Using differentially expressed genes from pre/post exercise human muscle, SignatureSearch was used to find drugs of similar perturbation signatures as possible candidates for Cortes R01 prelim data analysis for “exercise mimetics” for Alzheimers Disease.
Dataset:
GitHub:
System which operations were done on: MacBook Pro (16-inch, 2019), Processor 2.4 GHz 8-Core Intel Core i9, Memory 64 GB 2667 MHz DDR4, Graphics AMD Radeon Pro 5300M 4 GB Intel UHD Graphics 630 1536 MB
Contents - CMAP Method
- LINCS Method
#reference database
eh <- ExperimentHub()
## snapshotDate(): 2020-10-27
cmap <- eh[["EH3223"]]; cmap_expr <- eh[["EH3224"]]
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
Load in data
#DESeq2 results
human_deseq2_res <- read.csv("210518_Human_DESeq_Res.csv", header = TRUE, row.names = 1)
#LFC non NA genes
human_LFC <- human_deseq2_res[!is.na(human_deseq2_res$log2FoldChange),]
#Get rid of specific transcript .X numbers
rownames(human_LFC) <- sub("\\..*", "", rownames(human_LFC))
Find up and down genes
#showing up empty for LFC > 2 and p-adj < 0.05, so changed LFC to 1.5
human_up_df <- human_LFC[human_LFC$log2FoldChange >1 & human_LFC$padj <0.05,]
human_up_list <- rownames(human_up_df)
# Jen used mapIds but it sounds like it could be the same as select(), but mapIds gives error: Error in .processFilterParam(keys, x) : 'filter' has to be an 'AnnotationFilter', a list of 'AnnotationFilter' object, an 'AnnotationFilterList' or a valid filter expression!
human_up_list <- mapIds(EnsDb.Hsapiens.v75, keys = human_up_list, column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 138 of 252 requested IDs.
#correlation might be better method because the LFC?
human_down_df <- human_LFC[human_LFC$log2FoldChange < -1 & human_LFC$padj <0.05,]
human_down_list <- rownames(human_down_df)
human_down_list <- mapIds(EnsDb.Hsapiens.v75, keys= human_down_list , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 42 of 54 requested IDs.
(STRAIGHT FROM JEN’S MARKDOWN )Lamb et al. (2006) introduced the gene expression-based search method known as Connectivity Map (CMap) where a GES database is searched with a query GES for similar entries (Lamb et al. 2006). Specifically, the GESS method from Lamb et al. (2006), here termed as CMAP, uses as query the two label sets of the most up- and down-regulated genes from a genome-wide expression experiment, while the reference database is composed of rank transformed expression profiles (e.g. ranks of LFC or z-scores). The actual GESS algorithm is based on a vectorized rank difference calculation. The resulting Connectivity Score expresses to what degree the query up/down gene sets are enriched on the top and bottom of the database entries, respectively. The search results are a list of perturbagens such as drugs that induce similar or opposing GESs as the query. Similar GESs suggest similar physiological effects of the corresponding perturbagens.
Function qSig() builds an object to store the query signature, reference database and GESS method used for GESS methods
qsig_cmap_human <- qSig(query = list(upset=as.character(human_up_list), downset=as.character(human_down_list)),
gess_method="CMAP", refdb= "cmap")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database
cmap <- gess_cmap(qSig= qsig_cmap_human, chunk_size=5000)
cmap_res <- result(cmap)[1:50,]
write.csv2(cmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/output/SignatureSearch/210518_cmap_res_human_revDESeq2contrasts.csv")
cmap_res
## # A tibble: 50 x 10
## pert PCID cell type trend raw_score scaled_score N_upset N_downset
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <int>
## 1 SB-202190 53539… MCF7 trt_… up 0.859 1 59 6
## 2 pilocarpine 5909 PC3 trt_… down -0.808 -1 59 6
## 3 doxorubicin 31703 PC3 trt_… down -0.797 -0.987 59 6
## 4 tropine <NA> MCF7 trt_… down -0.786 -0.974 59 6
## 5 etanidazole 3276 HL60 trt_… down -0.780 -0.965 59 6
## 6 dirithromy… 64738… MCF7 trt_… down -0.779 -0.964 59 6
## 7 nimesulide 4495 MCF7 trt_… down -0.763 -0.944 59 6
## 8 scopolamin… <NA> MCF7 trt_… down -0.760 -0.941 59 6
## 9 N-acetyl-L… <NA> HL60 trt_… down -0.754 -0.933 59 6
## 10 adrenoster… 69273… HL60 trt_… down -0.750 -0.928 59 6
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>
#JEN : This table contains the search results for each perturbagen (here drugs) in the reference database ranked by their signature similarity to the query. For the CMAP method, the similarity metrics are raw_score and scaled_score. The raw score represents the bi-directional enrichment score (Kolmogorov-Smirnov statistic) for a given up/down query signature. Under the scaled_score column, the raw_score has been scaled to values from 1 to -1 by dividing positive scores and negative scores with the maximum positive score and the absolute value of the minimum negative score, respectively. The remaining columns in the search result table contain the following information. pert: name of perturbagen (e.g. drug) in the reference database; cell: acronym of cell type; type: perturbation type, e.g. compound treatment is trt_cp; trend: up or down when reference signature is positively or negatively connected with the query signature, respectively; N_upset or N_downset: number of genes in the query up or down sets, respectively; t_gn_sym: gene symbols of the corresponding drug targets.
Visualize GESS Results
#drugs_top15 <- unique(cmap_res$pert)[1:15]
drugs_top30 <- unique(result(cmap)$pert)[1:30]
gess_res_vis(cmap_res, drugs = drugs_top30, col = "scaled_score")
TSEA
The following introduces how to perform TSEA on drug-based GESS results using as functional annotation systems GO, KEGG and Reactome pathways. … The specialized enrichment algorithms include Duplication Adjusted Hypergeometric Test (dup_hyperG), Modified Gene Set Enrichment Analysis (mGSEA) and MeanAbs (mabs).
Internally, the latter converts the drug set to a target set, and then computes for it enrichment scores for each MF GO term based on the hypergeometric distribution. The enrichment results are stored in a feaResult object. It contains the organism information of the annotation system, and the ontology type of the GO annotation system. If the annotation system is KEGG, the latter will be “KEGG”. The object also stores the input drugs used for the enrichment test, as well as their target information.
tsea_kegg_cmap_human <- tsea_dup_hyperG(drugs = drugs_top30[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs:
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine
## Loading required package: org.Hs.eg.db
##
## Reading KEGG annotation online:
##
## Reading KEGG annotation online:
result(tsea_kegg_cmap_human)
## # A tibble: 192 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa00… Nitrogen me… 6/53 17/8112 6.81e-10 1.41e-7 6.09e-8 759/23… 6
## 2 hsa04… Adrenergic … 10/53 150/81… 3.42e- 8 3.53e-6 1.53e-6 207/56… 10
## 3 hsa04… Growth horm… 9/53 119/81… 5.98e- 8 4.13e-6 1.78e-6 207/29… 9
## 4 hsa05… Lipid and a… 11/53 215/81… 1.01e- 7 4.16e-6 1.80e-6 207/29… 11
## 5 hsa00… Steroid hor… 7/53 61/8112 1.12e- 7 4.16e-6 1.80e-6 1571/1… 7
## 6 hsa04… Relaxin sig… 9/53 129/81… 1.21e- 7 4.16e-6 1.80e-6 207/56… 9
## 7 hsa05… Yersinia in… 9/53 137/81… 2.03e- 7 5.28e-6 2.28e-6 207/29… 9
## 8 hsa04… AGE-RAGE si… 8/53 100/81… 2.26e- 7 5.28e-6 2.28e-6 207/56… 8
## 9 hsa04… Fc epsilon … 7/53 68/8112 2.41e- 7 5.28e-6 2.28e-6 207/24… 7
## 10 hsa04… MAPK signal… 12/53 294/81… 2.89e- 7 5.28e-6 2.28e-6 207/56… 12
## # … with 182 more rows
cmap_res_down <- cmap_res[cmap_res$trend =="down",]
cmap_res_up <- cmap_res[cmap_res$trend =="up",]
down_drug_cmap <- unique(cmap_res_down$pert)[1:20]
up_drug_cmap <- unique(cmap_res_up$pert)[1:20]
gess_res_vis(cmap_res, drugs = up_drug_cmap, col = "scaled_score")
dtnetplot(drugs = drugs(tsea_kegg_cmap_human), set = "hsa00140",
desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs:
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine
#u).
DSEA
Instead of translating ranked lists of drugs into target sets, as for TSEA, the functional annotation categories of the targets can be assigned to the drugs directly to perform Drug Set Enrichment Analysis (DSEA) instead. Since the drug lists from GESS results are usually unique, this strategy overcomes the duplication problem of the TSEA approach. This way classical enrichment methods, such as GSEA or tests based on the hypergeometric distribution, can be readily applied without major modifications to the underlying statistical methods. As explained above, TSEA and DSEA performed with the same enrichment statistics are not expected to generate identical results. Rather they often complement each other’s strengths and weaknesses.
dsea_res_cmap <- dsea_hyperG(drugs = drugs_top30[1:20], type = "KEGG",
pvalueCutoff = 0.05, qvalueCutoff = 0.05,
minGSSize = 10, maxGSSize = 2000)
result(dsea_res_cmap)
## # A tibble: 3 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… Chemical car… 4/10 732/20… 2.85e-4 0.0424 0.0343 pilocarp… 4
## 2 hsa00… Metabolism o… 4/10 869/20… 5.49e-4 0.0424 0.0343 pilocarp… 4
## 3 hsa00… Steroid horm… 4/10 906/20… 6.43e-4 0.0424 0.0343 pilocarp… 4
write.csv2(result(dsea_res_cmap), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs//210526_dsea_cmap_res_human.csv")
#five top pathways KEGG upregulated or downregulated, KEGG usually has maps that relate to pathways, but some geens are linked to multiple pathways
#Are there genes that are linked to linked to multiple KEGG pathways?
dtnetplot(drugs = drugs(dsea_res_cmap), set = "hsa00140",
desc="Steroid hormone biosynthesis")
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs:
## tropine / etanidazole / dirithromycin / scopolamine n-oxide / n-acetyl-l-leucine / adrenosterone / cytochalasin b / pentetrazol / alfadolone / adiphenine
LINCS Method
Subramanian et al. (2017) introduced a more complex GESS algorithm, here referred to as LINCS. While related to CMAP, there are several important differences among the two approaches. First, LINCS weights the query genes based on the corresponding differential expression scores of the GEPs in the reference database (e.g. LFC or z-scores). Thus, the reference database used by LINCS needs to store the actual score values rather than their ranks. Another relevant difference is that the LINCS algorithm uses a bi-directional weighted Kolmogorov-Smirnov enrichment statistic (ES) as similarity metric.
qsig_lincs_human <- qSig(query =list(upset=as.character(human_up_list), downset=as.character(human_down_list)),
gess_method="LINCS", refdb="lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
## 59 / 252 genes in up set share identifiers with reference database
## 6 / 54 genes in down set share identifiers with reference database
The similarity scores stored in the LINCS result table are summarized here. WTCS: Weighted Connectivity Score; WTCS_Pval: nominal p-value of WTCS; WTCS_FDR: false discovery rate of WTCS_Pval; NCS: normalized connectivity score; NCSct: NCS summarized across cell types; Tau: enrichment score standardized for a given database. The latter is only included in the result table if tau=TRUE in a gess_lincs function call. The example given is run with tau=FALSE, because the tau values are only meaningful when the complete LINCS database is used which is not the case for the toy database.
The following provides a more detailed description of the similarity scores computed by the LINCS method. Additional details are available in the Supplementary Material Section of the Subramanian et al. (2017) paper.
WTCS: The Weighted Connectivity Score is a bi-directional ES for an up/down query set. If the ES values of an up set and a down set are of different signs, then WTCS is (ESup-ESdown)/2, otherwise, it is 0. WTCS values range from -1 to 1. They are positive or negative for signatures that are positively or inversely related, respectively, and close to zero for signatures that are unrelated.
WTCS_Pval and WTCS_FDR: The nominal p-value of the WTCS and the corresponding false discovery rate (FDR) are computed by comparing the WTCS against a null distribution of WTCS values obtained from a large number of random queries (e.g. 1000).
NCS: To make connectivity scores comparable across cell types and perturbation types, the scores are normalized. Given a vector of WTCS values w resulting from a query, the values are normalized within each cell line c and perturbagen type t to obtain the Normalized Connectivity Score (NCS) by dividing the WTCS value by the signed mean of the WTCS values within the subset of signatures in the reference database corresponding to c and t.
NCSct: The NCS is summarized across cell types as follows. Given a vector of NCS values for perturbagen p, relative to query q, across all cell lines c in which p was profiled, a cell-summarized connectivity score is obtained using a maximum quantile statistic. It compares the 67 and 33 quantiles of NCSp,c and retains whichever is of higher absolute magnitude.
Tau: The standardized score Tau compares an observed NCS to a large set of NCS values that have been pre-computed for a specific reference database. The query results are scored with Tau as a standardized measure ranging from 100 to -100. A Tau of 90 indicates that only 10% of reference perturbations exhibit stronger connectivity to the query. This way one can make more meaningful comparisons across query results.
lincs_human <- gess_lincs(qsig_lincs_human, sortby="NCS", tau=TRUE)
result(lincs_human)[1:50,]
## # A tibble: 50 x 15
## pert PCID cell type trend WTCS WTCS_Pval WTCS_FDR NCS Tau
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 BRD-K863366… 546541… PHH trt_… down -0.617 0.0000167 9.53e-5 -1.77 -99.9
## 2 BRD-K904461… 445011… PC3 trt_… up 0.641 0.0000156 9.53e-5 1.75 99.4
## 3 PD-173074 1401 PHH trt_… up 0.619 0.0000166 9.53e-5 1.73 99.3
## 4 garcinol 737075… NPC trt_… down -0.609 0.0000170 9.53e-5 -1.73 -99.5
## 5 GW-3965 447905 SKB trt_… up 0.636 0.0000158 9.53e-5 1.73 99.4
## 6 BRD-A391216… 2793096 MCF7 trt_… down -0.627 0.0000162 9.53e-5 -1.71 -99.6
## 7 ASN-05257430 650361 A549 trt_… down -0.608 0.0000171 9.53e-5 -1.71 -99.9
## 8 thiethylper… 5440 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -100.
## 9 geranylgera… 5281365 MCF7 trt_… down -0.616 0.0000167 9.53e-5 -1.68 -99.7
## 10 BRD-A741349… 3950572 HA1E trt_… down -0.632 0.0000160 9.53e-5 -1.68 -99.8
## # … with 40 more rows, and 5 more variables: TauRefSize <int>, NCSct <dbl>,
## # N_upset <int>, N_downset <int>, t_gn_sym <chr>
lincs_res <- result(lincs_human)
write.csv(lincs_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_human_res.csv")
drugs_lincs_top <- unique(lincs_res$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_top, col = "NCS")
lincs_res_down <- lincs_res[lincs_res$trend == "down",]
down_drug_lincs <- unique(lincs_res_down$pert)[1:20]
lincs_res_up <- lincs_res[lincs_res$trend == "up",]
up_drug_lincs <- unique(lincs_res_up$pert)[1:20]
drugs_lincs_topdown <- unique(lincs_res_down$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topdown , col = "NCS")
tsea_lincs_down <- tsea_dup_hyperG(drugs = drugs_lincs_topdown[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 12 drugs:
## brd-k86336638 / garcinol / brd-a39121618 / asn-05257430 / geranylgeraniol / brd-a74134989 / brd-k31302860 / rhapontin / brd-k93410934 / brd-k36591038 / brd-a02241946 / sinensetin
result(tsea_lincs_down)
## # A tibble: 50 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… Nicotine a… 22/53 40/8112 5.67e-40 5.27e-38 4.12e-38 2555/2… 22
## 2 hsa04… Neuroactiv… 26/53 341/81… 2.22e-22 1.03e-20 8.07e-21 1812/1… 26
## 3 hsa05… Morphine a… 16/53 91/8112 1.69e-19 5.23e-18 4.09e-18 1812/2… 16
## 4 hsa04… GABAergic … 15/53 89/8112 5.25e-18 1.22e-16 9.54e-17 2555/2… 15
## 5 hsa04… Retrograde… 16/53 148/81… 5.52e-16 1.03e-14 8.02e-15 2555/2… 16
## 6 hsa04… Taste tran… 10/53 86/8112 1.41e-10 2.19e- 9 1.71e- 9 2555/2… 10
## 7 hsa05… Cocaine ad… 8/53 49/8112 7.04e-10 9.35e- 9 7.30e- 9 1812/2… 8
## 8 hsa05… Amphetamin… 8/53 69/8112 1.18e- 8 1.37e- 7 1.07e- 7 1812/2… 8
## 9 hsa04… Glutamater… 8/53 114/81… 6.22e- 7 6.43e- 6 5.02e- 6 2902/2… 8
## 10 hsa04… cAMP signa… 10/53 216/81… 1.04e- 6 9.69e- 6 7.57e- 6 1812/2… 10
## # … with 40 more rows
write.csv2(result(tsea_lincs_down), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_tsea.csv")
dsea_res_lincs <- dsea_hyperG(drugs = drugs_lincs_topdown, type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(dsea_res_lincs)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
#0 rows
drugs_lincs_topup <- unique(lincs_res_up$pert)[1:20]
gess_res_vis(lincs_res, drugs = drugs_lincs_topup , col = "NCS")
tsea_lincs_up <- tsea_dup_hyperG(drugs = drugs_lincs_topup[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-k90446118 / badge / salicin / brd-a93015352 / brd-k91047982 / brd-k33928422 / brd-k31412180 / dibutyrylcyclic-gmp / brd-k79779816 / brd-k42687792 / hydrocotarnine / tetroquinone / brd-k08198779
result(tsea_lincs_up)
## # A tibble: 114 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Calcium sig… 12/47 240/81… 7.14e-9 1.20e-6 9.09e-7 2263/22… 12
## 2 hsa04… Rap1 signal… 10/47 210/81… 2.45e-7 2.06e-5 1.56e-5 2263/22… 10
## 3 hsa04… PI3K-Akt si… 11/47 354/81… 3.96e-6 1.62e-4 1.23e-4 2263/22… 11
## 4 hsa04… MAPK signal… 10/47 294/81… 5.28e-6 1.62e-4 1.23e-4 2263/22… 10
## 5 hsa01… EGFR tyrosi… 6/47 79/8112 5.50e-6 1.62e-4 1.23e-4 2263/22… 6
## 6 hsa04… Ras signali… 9/47 232/81… 5.79e-6 1.62e-4 1.23e-4 2263/22… 9
## 7 hsa05… Central car… 5/47 70/8112 4.79e-5 1.08e-3 8.18e-4 2263/22… 5
## 8 hsa04… Adherens ju… 5/47 71/8112 5.14e-5 1.08e-3 8.18e-4 60/1457… 5
## 9 hsa04… Focal adhes… 7/47 201/81… 1.40e-4 2.62e-3 1.98e-3 2321/23… 7
## 10 hsa05… Prostate ca… 5/47 97/8112 2.27e-4 3.54e-3 2.68e-3 2263/51… 5
## # … with 104 more rows
write.csv2(result(tsea_lincs_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210526_lincs_up_tsea.csv" )
dsea_lincs_down <- dsea_hyperG(drugs = drugs_lincs_topdown, type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(dsea_lincs_down)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
GCMAP
The Bioconductor gCMAP (Sandmann et al. 2014) package provides access to a related but not identical implementation of the original CMAP algorithm proposed by Lamb et al. (2006). It uses as query a rank transformed GEP and the reference database is composed of the labels of up and down regulated DEG sets. This is the opposite situation of the CMAP method, where the query is composed of the labels of up and down regulated DEGs and the database contains rank transformed GESs.
First, create matrix with zscores
#changed LFC > 2 to 1.5, too few genes otherwise
DESEQ2_sig_df <- human_LFC[abs(as.numeric(human_LFC$log2FoldChange)) >1.5 & human_LFC$padj <0.05,]
LOG <- as.numeric(DESEQ2_sig_df$log2FoldChange)
IDS <- mapIds(EnsDb.Hsapiens.v75, keys= rownames(DESEQ2_sig_df) , column="ENTREZID", keytype="GENEID", multiVals="first")
## Warning: Unable to map 40 of 126 requested IDs.
names(IDS) <- NULL
LOG_V2 <- LOG[!is.na(IDS)]
IDS_V2 <-IDS[!is.na(IDS)]
duplicated_genes <- unique(IDS_V2[duplicated(IDS_V2)])
names(LOG_V2) <- IDS_V2
#fix<- c()
#for (i in 1:length(duplicated_genes)){
# group <- LOG_V2[names(LOG_V2) == duplicated_genes[i]]
# top <- group[group == max(abs(group))]
# fix <- c(fix, top)
#}
#fix
#LOG_V3 <- LOG_V2[!names(LOG_V2) %in% duplicated_genes]
#LOG_V4 <- c(LOG_V3, fix)
#LOG_V5 <- as.data.frame(LOG_V4)
#rownames(LOG_V5) <- names(LOG_V4)
#duplicated_genes is zero
LOG_V5 <- as.data.frame(LOG_V2)
rownames(LOG_V5) <- names(LOG_V2)
qsig_gcmap <- qSig(query = as.matrix(LOG_V5), gess_method = "gCMAP", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
gcmap <- gess_gcmap(qsig_gcmap, higher = 1, lower = -1)
result(gcmap)
## # A tibble: 45,956 x 10
## pert PCID cell type trend effect nSet nFound signed t_gn_sym
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <lgl> <chr>
## 1 capsaze… 27334… HCC515 trt_… up 1 777 2 TRUE TRPV1; TRPV4
## 2 BRD-K95… 53107… A549 trt_… down -1 936 2 TRUE <NA>
## 3 hydroco… 3646 HT29 trt_… down -0.994 988 2 TRUE <NA>
## 4 ZM-39923 3797 PC3 trt_… down -0.994 458 2 TRUE <NA>
## 5 BRD-K65… 28571… NEU trt_… down -0.994 1547 2 TRUE <NA>
## 6 RU-24969 108029 ASC trt_… up 0.994 1278 2 TRUE HTR1B; HTR1D; …
## 7 BRD-A31… 29988… PC3 trt_… up 0.994 525 2 TRUE <NA>
## 8 gingerol 442793 NEU trt_… down -0.994 590 2 TRUE <NA>
## 9 austric… 64199… HA1E trt_… up 0.988 1791 3 TRUE <NA>
## 10 BRD-K58… 31076… VCAP trt_… up 0.988 446 2 TRUE <NA>
## # … with 45,946 more rows
The columns in the corresponding search result table, that are specific to the gCMAP method, contain the following information. effect: scaled bi-directional enrichment score corresponding to the scaled_score under the CMAP result; nSet: number of genes in the reference gene sets after applying the higher and lower cutoff; nFound: number of genes in the reference gene sets that are present in the query signature; signed: whether the gene sets in the reference database have signs, e.g. representing up and down regulated genes when computing scores.
gcmap_res <- result(gcmap)
write.csv2(gcmap_res, file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_res_human.csv")
drugs_top20_gcmap <- c(unique(gcmap_res$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")
gcmap_res_down <- gcmap_res[gcmap_res$trend == "down",]
down_drug_gcmap <- unique(gcmap_res_down$pert)[1:20]
gcmap_res_up <- gcmap_res[gcmap_res$trend == "up",]
up_drug_gcmap <- unique(gcmap_res_up$pert)[1:20]
drugs_top20 <- c(unique(gcmap_res_down$pert)[1:20], "temozolomide")
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap, col = "effect")
TSEA for gCMAP
dup_rct_res_gcmap_down <- tsea_dup_hyperG(drugs = drugs_top20_gcmap[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / brd-a31429689 / gingerol / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k42867405 / brd-k02641134 / mw-stk33-3a
result(dup_rct_res_gcmap_down)
## # A tibble: 27 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Neuroactiv… 11/15 341/81… 7.26e-13 1.96e-11 1.22e-11 7442/3… 11
## 2 hsa04… Serotonerg… 5/15 115/81… 1.41e- 6 1.90e- 5 1.18e- 5 3351/3… 5
## 3 hsa04… Calcium si… 6/15 240/81… 2.52e- 6 2.27e- 5 1.42e- 5 3357/3… 6
## 4 hsa04… cAMP signa… 4/15 216/81… 5.30e- 4 3.58e- 3 2.23e- 3 3351/3… 4
## 5 hsa04… Inflammato… 3/15 98/8112 7.00e- 4 3.78e- 3 2.36e- 3 7442/5… 3
## 6 hsa00… Arachidoni… 2/15 61/8112 5.48e- 3 2.47e- 2 1.54e- 2 6916/5… 2
## 7 hsa04… Gastric ac… 2/15 76/8112 8.41e- 3 3.24e- 2 2.02e- 2 887/25… 2
## 8 hsa04… Taste tran… 2/15 86/8112 1.07e- 2 3.34e- 2 2.09e- 2 3351/3… 2
## 9 hsa04… Gap juncti… 2/15 88/8112 1.11e- 2 3.34e- 2 2.09e- 2 3357/1… 2
## 10 hsa04… Pancreatic… 2/15 102/81… 1.48e- 2 3.99e- 2 2.49e- 2 5319/8… 2
## # … with 17 more rows
write.csv2(result(dup_rct_res_gcmap_down), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_down_human.csv")
drugs_top20_gcmap_up <- c(unique(gcmap_res_up$pert)[1:20])
gess_res_vis(gcmap_res, drugs = drugs_top20_gcmap_up, col = "effect")
dup_rct_res_gcmap_up <- tsea_dup_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091
result(dup_rct_res_gcmap_up)
## # A tibble: 56 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Adrenergic… 15/38 150/81… 5.34e-17 3.21e-15 1.29e-15 148/14… 15
## 2 hsa05… Arrhythmog… 12/38 77/8112 4.88e-16 1.46e-14 5.90e-15 775/77… 12
## 3 hsa04… Cardiac mu… 12/38 87/8112 2.28e-15 3.50e-14 1.41e-14 775/77… 12
## 4 hsa04… Oxytocin s… 14/38 154/81… 2.84e-15 3.50e-14 1.41e-14 775/77… 14
## 5 hsa04… MAPK signa… 17/38 294/81… 2.94e-15 3.50e-14 1.41e-14 773/77… 17
## 6 hsa05… Hypertroph… 12/38 90/8112 3.50e-15 3.50e-14 1.41e-14 775/77… 12
## 7 hsa05… Dilated ca… 12/38 96/8112 7.83e-15 6.71e-14 2.71e-14 775/77… 12
## 8 hsa04… Calcium si… 15/38 240/81… 6.36e-14 4.77e-13 1.93e-13 3357/3… 15
## 9 hsa04… Serotonerg… 11/38 115/81… 2.50e-12 1.66e-11 6.72e-12 3351/3… 11
## 10 hsa04… GnRH secre… 8/38 64/8112 3.88e-10 2.33e- 9 9.40e-10 775/77… 8
## # … with 46 more rows
write.csv2(result(dup_rct_res_gcmap_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_gcmap_tsea_up_human.csv")
hyperG_k_res_gcmap_up <- dsea_hyperG(drugs = drugs_top20_gcmap_up[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_gcmap_up)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
#0 results
dtnetplot(drugs = drugs(hyperG_k_res_gcmap_up), set = "hsa04921",
desc="Oxytocin signaling pathway")
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## brd-a31429689 / austricine / brd-k58628167 / brd-k85174275 / brd-k59036917 / brd-k62719675 / cortisone-acetate / brd-k76357752 / brd-k00551481 / brd-k33452357 / brd-k11778372 / brd-k26359746 / brd-k66703091
Fisher Method
Fisher’s exact test (Graham J. G. Upton 1992) can also be used to search a GS-DB (gene set database) for entries that are similar to a GS-Q (gene set query). In this case both the query and the database are composed of gene label sets, such as DEG sets.
qsig_fisher <- qSig(query = as.matrix(LOG_V5), gess_method = "Fisher", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
fisher <- gess_fisher(qSig = qsig_fisher, higher = 1, lower = -1)
result(fisher)[1:50,]
## # A tibble: 50 x 13
## pert PCID cell type trend pval padj effect LOR nSet nFound signed
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
## 1 SA-1… 5465… NKDBA trt_… over 2.54e-6 0.0127 4.71 1.62 1806 18 FALSE
## 2 BRD-… 5664… PC3 trt_… over 3.01e-6 0.0150 4.67 1.54 2475 21 FALSE
## 3 cycl… 1640… PHH trt_… over 7.14e-6 0.0357 4.49 1.50 4468 28 FALSE
## 4 QL-X… 7370… HS57… trt_… over 1.91e-5 0.0478 4.27 1.44 4657 28 FALSE
## 5 K784… 4381… VCAP trt_… under 6.59e-5 0.0843 -3.99 -1.32 9826 20 FALSE
## 6 BRD-… 5333… PC3 trt_… over 6.74e-5 0.0843 3.99 1.71 732 10 FALSE
## 7 BRD-… 7816… MCF7 trt_… over 5.28e-5 0.0843 4.04 1.34 3640 24 FALSE
## 8 BRD-… 7632… MCF7 trt_… over 1.93e-5 0.0963 4.27 1.41 2530 20 FALSE
## 9 BRD-… 5664… PC3 trt_… over 1.14e-4 0.114 3.86 2.21 247 6 FALSE
## 10 HDAC… NotF… NEU trt_… over 4.40e-4 0.122 3.52 1.26 1701 14 FALSE
## # … with 40 more rows, and 1 more variable: t_gn_sym <chr>
The columns in the result table specific to the Fisher method include the following information. pval: p-value of the Fisher’s exact test; padj: p-value adjusted for multiple hypothesis testing using R’s p.adjust function with the Benjamini & Hochberg (BH) method; effect: z-score based on the standard normal distribution; LOR: log odds ratio.
write.csv2(result(fisher), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_res_human.csv")
fisher_res <- result(fisher)
drugs_top20_fisher <- c(unique(fisher_res$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher, col = "effect")
fisher_res_over <- fisher_res[fisher_res$trend=="over", ]
up_drug_fisher <- unique(fisher_res_over$pert)[1:20]
fisher_res_under <- fisher_res[fisher_res$trend=="under", ]
down_drug_fisher <- unique(fisher_res_under$pert)[1:20]
drugs_top20_fisher_over <- c(unique(fisher_res_over$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_over, col = "effect")
dup_rct_res_fisher_over <- tsea_dup_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 13 drugs:
## sa-1472623 / brd-k23980805 / brd-k30836161 / brd-k53932786 / brd-k50659736 / brd-k87371039 / hdac1-selective / isonicotinohydroxamic-acid / brd-k89097220 / cephalosporanic-acid / brd-k26767475 / brd-k86948282 / brd-k96704748
result(dup_rct_res_fisher_over)
## # A tibble: 98 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Serotonerg… 17/57 115/81… 9.75e-19 1.51e-16 1.02e-16 773/77… 17
## 2 hsa04… Calcium si… 15/57 240/81… 5.49e-11 4.26e- 9 2.86e- 9 10105/… 15
## 3 hsa04… MAPK signa… 16/57 294/81… 8.94e-11 4.62e- 9 3.11e- 9 5530/5… 16
## 4 hsa04… Adrenergic… 12/57 150/81… 3.55e-10 1.38e- 8 9.26e- 9 147/14… 12
## 5 hsa04… Oxytocin s… 11/57 154/81… 7.03e- 9 2.18e- 7 1.46e- 7 5530/5… 11
## 6 hsa05… Chemical c… 8/57 69/8112 2.15e- 8 5.54e- 7 3.73e- 7 1577/1… 8
## 7 hsa04… Type II di… 7/57 46/8112 2.50e- 8 5.55e- 7 3.73e- 7 2475/7… 7
## 8 hsa05… Arrhythmog… 8/57 77/8112 5.17e- 8 1.00e- 6 6.73e- 7 775/77… 8
## 9 hsa04… Taste tran… 8/57 86/8112 1.24e- 7 2.11e- 6 1.42e- 6 773/77… 8
## 10 hsa04… Cardiac mu… 8/57 87/8112 1.36e- 7 2.11e- 6 1.42e- 6 775/77… 8
## # … with 88 more rows
write.csv2(result(dup_rct_res_fisher_over), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_fisher_tsea_over_human.csv")
hyperG_k_res_fisher_over <- dsea_hyperG(drugs = drugs_top20_fisher_over[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_fisher_over)
## # A tibble: 20 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… Amyotrophic … 4/7 1174/2… 3.26e-4 0.0185 0.0117 cyclosp… 4
## 2 hsa05… Huntington d… 4/7 1185/2… 3.38e-4 0.0185 0.0117 cyclosp… 4
## 3 hsa05… Parkinson di… 4/7 1222/2… 3.80e-4 0.0185 0.0117 cyclosp… 4
## 4 hsa05… Prion disease 4/7 1382/2… 6.10e-4 0.0223 0.0141 cyclosp… 4
## 5 hsa00… Retinol meta… 3/7 732/20… 1.43e-3 0.0305 0.0194 cyclosp… 3
## 6 hsa05… Chemical car… 3/7 732/20… 1.43e-3 0.0305 0.0194 cyclosp… 3
## 7 hsa04… Long-term po… 3/7 777/20… 1.69e-3 0.0305 0.0194 cyclosp… 3
## 8 hsa02… ABC transpor… 2/7 199/20… 1.91e-3 0.0305 0.0194 cyclosp… 2
## 9 hsa00… Drug metabol… 3/7 853/20… 2.22e-3 0.0305 0.0194 cyclosp… 3
## 10 hsa00… Metabolism o… 3/7 869/20… 2.34e-3 0.0305 0.0194 cyclosp… 3
## 11 hsa00… Steroid horm… 3/7 906/20… 2.63e-3 0.0305 0.0194 cyclosp… 3
## 12 hsa05… Spinocerebel… 3/7 906/20… 2.63e-3 0.0305 0.0194 cyclosp… 3
## 13 hsa05… Amphetamine … 3/7 916/20… 2.72e-3 0.0305 0.0194 cyclosp… 3
## 14 hsa05… Herpes simpl… 3/7 1122/2… 4.85e-3 0.0491 0.0311 ql-x-13… 3
## 15 hsa04… Oxytocin sig… 3/7 1148/2… 5.17e-3 0.0491 0.0311 cyclosp… 3
## 16 hsa04… Synaptic ves… 2/7 357/20… 5.99e-3 0.0491 0.0311 verapam… 2
## 17 hsa04… Growth hormo… 3/7 1217/2… 6.10e-3 0.0491 0.0311 ql-x-13… 3
## 18 hsa04… Cellular sen… 3/7 1248/2… 6.55e-3 0.0491 0.0311 cyclosp… 3
## 19 hsa01… Endocrine re… 3/7 1259/2… 6.71e-3 0.0491 0.0311 cyclosp… 3
## 20 hsa04… Non-alcoholi… 3/7 1260/2… 6.72e-3 0.0491 0.0311 verapam… 3
drugs_top20_fisher_under <- c(unique(fisher_res_under$pert)[1:20])
gess_res_vis(fisher_res, drugs = drugs_top20_fisher_under, col = "effect")
dup_rct_res_fisher_under <- tsea_dup_hyperG(drugs = drugs_top20_fisher_under[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 15 drugs:
## k784-3131 / oxetane / brd-k32111336 / brd-k65134192 / brd-k04722205 / bg-1021 / brd-k53932786 / ran-14b / brd-k66448556 / brd-k74606027 / gnf-2 / brd-k10806285 / ku-c103655 / brd-k41284916 / brd-k96194181
result(dup_rct_res_fisher_under)
## # A tibble: 182 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Dopaminerg… 11/30 132/81… 5.84e-13 1.09e-10 4.30e-11 7054/1… 11
## 2 hsa04… Fc epsilon… 7/30 68/8112 3.71e- 9 2.83e- 7 1.12e- 7 207/24… 7
## 3 hsa04… Prolactin … 7/30 70/8112 4.56e- 9 2.83e- 7 1.12e- 7 7054/2… 7
## 4 hsa05… Yersinia i… 8/30 137/81… 2.31e- 8 1.07e- 6 4.25e- 7 207/29… 8
## 5 hsa00… Folate bio… 5/30 26/8112 3.04e- 8 1.13e- 6 4.48e- 7 5053/5… 5
## 6 hsa04… T cell rec… 7/30 104/81… 7.42e- 8 2.30e- 6 9.12e- 7 207/29… 7
## 7 hsa05… Toxoplasmo… 7/30 112/81… 1.24e- 7 3.30e- 6 1.31e- 6 4843/2… 7
## 8 hsa04… Sphingolip… 7/30 119/81… 1.89e- 7 3.90e- 6 1.54e- 6 207/56… 7
## 9 hsa04… Growth hor… 7/30 119/81… 1.89e- 7 3.90e- 6 1.54e- 6 207/29… 7
## 10 hsa04… Osteoclast… 7/30 128/81… 3.11e- 7 5.55e- 6 2.20e- 6 5468/2… 7
## # … with 172 more rows
qsig_sp <- qSig(query = as.matrix(LOG_V5), gess_method = "Cor", refdb = "lincs")
## see ?signatureSearchData and browseVignettes('signatureSearchData') for documentation
## loading from cache
sp <- gess_cor(qSig=qsig_sp, method="spearman")
result(sp)[1:50,]
## # A tibble: 50 x 7
## pert PCID cell type trend cor_score t_gn_sym
## <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 galanta… 9651 HT29 trt_… down -0.618 ACHE; BCHE; CHRNA1; CHRNA10; CH…
## 2 STK-064… 2878746 PC3 trt_… up 0.573 <NA>
## 3 BRD-K42… 546384… NPC trt_… down -0.552 <NA>
## 4 BRD-K56… 546665… PC3 trt_… down -0.546 <NA>
## 5 BRD-K91… 445014… A375 trt_… down -0.540 <NA>
## 6 BRD-K71… 107410 FIBR… trt_… down -0.533 <NA>
## 7 salbuta… 2083 VCAP trt_… up 0.530 ADCY10; ADRB1; ADRB2; ADRB3
## 8 BRD-K48… 546501… NPC trt_… up 0.529 <NA>
## 9 BRD-K90… 498358… VCAP trt_… up 0.526 <NA>
## 10 BRD-K93… 6404603 A375 trt_… down -0.523 <NA>
## # … with 40 more rows
sp_res <- result(sp)
write.csv2(result(sp), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
drugs_top20_spearman <- c(unique(sp_res$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_spearman, col = "cor_score")
sp_res_up <- sp_res[sp_res$trend == "up",]
up_drug_sp <- unique(sp_res_up$pert)[1:20]
sp_res_down <- sp_res[sp_res$trend == "down",]
down_drug_sp <- unique(sp_res_down$pert)[1:20]
drugs_top20_sp_down <- c(unique(sp_res_down$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_down, col = "cor_score")
dup_rct_res_sp_down <- tsea_dup_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 14 drugs:
## brd-k42743267 / brd-k56450018 / brd-k91063002 / brd-k71339761 / brd-k93623754 / brd-k67306351 / brd-k81408913 / brd-k99874282 / brd-k59488055 / cercosporin / brd-k52099002 / brd-k28162668 / brd-k97522823 / brd-k84024786
result(dup_rct_res_sp_down)
## # A tibble: 51 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Neuroactiv… 29/58 341/81… 3.75e-25 4.49e-23 3.63e-23 1134/5… 29
## 2 hsa05… Nicotine a… 9/58 40/8112 5.89e-12 3.53e-10 2.85e-10 1137/8… 9
## 3 hsa04… Insulin se… 11/58 86/8112 1.50e-11 4.61e-10 3.72e-10 3778/3… 11
## 4 hsa04… Cholinergi… 12/58 113/81… 1.54e-11 4.61e-10 3.72e-10 43/113… 12
## 5 hsa05… Chemical c… 12/58 212/81… 2.30e- 8 5.53e- 7 4.46e- 7 1136/1… 12
## 6 hsa04… cGMP-PKG s… 9/58 167/81… 2.40e- 6 4.80e- 5 3.87e- 5 4985/3… 9
## 7 hsa04… Leukocyte … 7/58 114/81… 1.50e- 5 2.46e- 4 1.99e- 4 1535/1… 7
## 8 hsa05… Leishmania… 6/58 77/8112 1.64e- 5 2.46e- 4 1.99e- 4 1535/1… 6
## 9 hsa05… Diabetic c… 8/58 203/81… 8.77e- 5 1.17e- 3 9.44e- 4 1535/1… 8
## 10 hsa05… Prion dise… 9/58 273/81… 1.22e- 4 1.46e- 3 1.18e- 3 1535/1… 9
## # … with 41 more rows
hyperG_k_res_sp_down<- dsea_hyperG(drugs = drugs_top20_sp_down[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_down)
## # A tibble: 4 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa04… Pancreatic … 4/6 857/20… 4.25e-5 0.00472 0.00430 dextrome… 4
## 2 hsa04… cGMP-PKG si… 4/6 1509/2… 3.89e-4 0.0169 0.0154 dextrome… 4
## 3 hsa04… Insulin sec… 3/6 595/20… 4.56e-4 0.0169 0.0154 miconazo… 3
## 4 hsa04… Salivary se… 3/6 884/20… 1.45e-3 0.0402 0.0366 miconazo… 3
drugs_top20_sp_up <- c(unique(sp_res_up$pert)[1:20])
gess_res_vis(sp_res, drugs = drugs_top20_sp_up, col = "cor_score")
dup_rct_res_sp_up <- tsea_dup_hyperG(drugs = drugs_top20[1:20], type = "KEGG",
pvalueCutoff=0.5, qvalueCutoff = 0.5)
## No targets found in DrugBank/CLUE/STITCH database for 10 drugs:
## brd-k95547711 / hydrocotarnine / zm-39923 / brd-k65050353 / gingerol / brd-k42867405 / brd-k02641134 / mw-stk33-3a / brd-k64106162 / brd-k12393722
result(dup_rct_res_sp_up)
## # A tibble: 196 x 9
## ID Description GeneRatio BgRatio pvalue p.adjust qvalue itemID Count
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int>
## 1 hsa05… PD-L1 expr… 12/50 89/8112 1.23e-13 2.49e-11 8.39e-12 7535/3… 12
## 2 hsa05… Yersinia i… 11/50 137/81… 4.53e-10 4.60e- 8 1.55e- 8 2185/7… 11
## 3 hsa05… Hepatitis B 11/50 162/81… 2.74e- 9 1.85e- 7 6.24e- 8 1017/2… 11
## 4 hsa04… Prolactin … 8/50 70/8112 8.21e- 9 4.17e- 7 1.40e- 7 3717/2… 8
## 5 hsa04… T cell rec… 9/50 104/81… 1.07e- 8 4.34e- 7 1.46e- 7 7535/2… 9
## 6 hsa04… Growth hor… 9/50 119/81… 3.51e- 8 1.19e- 6 4.00e- 7 3717/2… 9
## 7 hsa05… Human immu… 11/50 212/81… 4.58e- 8 1.33e- 6 4.48e- 7 983/21… 11
## 8 hsa05… Lipid and … 11/50 215/81… 5.29e- 8 1.34e- 6 4.53e- 7 1559/3… 11
## 9 hsa04… Osteoclast… 9/50 128/81… 6.63e- 8 1.48e- 6 5.00e- 7 7297/2… 9
## 10 hsa04… Th1 and Th… 8/50 92/8112 7.31e- 8 1.48e- 6 5.00e- 7 7535/3… 8
## # … with 186 more rows
write.csv2(result(dup_rct_res_sp_up), file = "/Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/210604_spearman_res_human.csv")
hyperG_k_res_sp_up <- dsea_hyperG(drugs = drugs_top20_sp_up[1:20], type = "KEGG",
pvalueCutoff = 0.05 , qvalueCutoff = 0.05,
minGSSize = 2, maxGSSize = 2000)
result(hyperG_k_res_sp_up)
## # A tibble: 0 x 9
## # … with 9 variables: ID <chr>, Description <chr>, GeneRatio <chr>,
## # BgRatio <chr>, pvalue <dbl>, p.adjust <dbl>, qvalue <dbl>, itemID <chr>,
## # Count <int>
intersect(up_drug_cmap, up_drug_lincs)
## character(0)
intersect(up_drug_cmap, up_drug_gcmap)
## character(0)
intersect(up_drug_cmap, up_drug_sp)
## character(0)
intersect(up_drug_cmap, up_drug_fisher)
## character(0)
intersect(up_drug_lincs, up_drug_gcmap)
## character(0)
intersect(up_drug_lincs, up_drug_sp)
## character(0)
intersect(up_drug_lincs, up_drug_fisher)
## character(0)
intersect(up_drug_gcmap, up_drug_sp)
## character(0)
intersect(up_drug_gcmap, up_drug_fisher)
## character(0)
intersect(up_drug_sp, up_drug_fisher)
## [1] "HDAC1-selective"
Location of final scripts: Locally /Users/eramsey/Desktop/R21_210302/PreliminaryR21/signatureSearch_outputs/
Operations complete: 210604
Versions sessionInfo()